Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?

  • Jump to comment-1
    bharath.rupireddyforpostgres@gmail.com2022-07-20T11:39:09+00:00
    Hi, After the commit [1], is it correct to say errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the contents in backend global memory, not actually reading from backup_label file? However, it is correct to say that in read_backup_label. IMO, we can either say "invalid backup_label contents found" or we can be more descriptive and say "invalid "START WAL LOCATION" line found in backup_label content" and "invalid "BACKUP FROM" line found in backup_label content" and so on. Thoughts? [1] commit 39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a Author: Stephen Frost <sfrost@snowman.net> Date: Wed Apr 6 14:41:03 2022 -0400 Remove exclusive backup mode errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); Regards, Bharath Rupireddy.
    • Jump to comment-1
      horikyota.ntt@gmail.com2022-07-21T09:02:59+00:00
      At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Hi, > > After the commit [1], is it correct to say errmsg("invalid data in file > \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the > contents in backend global memory, not actually reading from backup_label > file? However, it is correct to say that in read_backup_label. > > IMO, we can either say "invalid backup_label contents found" or we can be > more descriptive and say "invalid "START WAL LOCATION" line found in > backup_label content" and "invalid "BACKUP FROM" line found in > backup_label content" and so on. > > Thoughts? Previously there the case the "char *labelfile" is loaded from a file, but currently it is alwasy a string build on the process. In that sense, nowadays it is a kind of internal error, which I think is not supposed to be exposed to users. So I think we can leave the code alone to avoid back-patching obstacles. But if we decided to change the code around, I'd like to change the string into a C struct, so that we don't need to parse it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
      • Jump to comment-1
        bharath.rupireddyforpostgres@gmail.com2022-07-25T08:51:38+00:00
        On Thu, Jul 21, 2022 at 2:33 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > Hi, > > > > After the commit [1], is it correct to say errmsg("invalid data in file > > \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the > > contents in backend global memory, not actually reading from backup_label > > file? However, it is correct to say that in read_backup_label. > > > > IMO, we can either say "invalid backup_label contents found" or we can be > > more descriptive and say "invalid "START WAL LOCATION" line found in > > backup_label content" and "invalid "BACKUP FROM" line found in > > backup_label content" and so on. > > > > Thoughts? > > Previously there the case the "char *labelfile" is loaded from a file, > but currently it is alwasy a string build on the process. In that > sense, nowadays it is a kind of internal error, which I think is not > supposed to be exposed to users. > > So I think we can leave the code alone to avoid back-patching > obstacles. But if we decided to change the code around, I'd like to > change the string into a C struct, so that we don't need to parse it. Hm. I think we must take this opportunity to clean it up. You are right, we don't need to parse the label file contents (just like we used to do previously after reading it from the file) in do_pg_backup_stop(), instead we can just pass a structure. Also, do_pg_backup_stop() isn't modifying any labelfile contents, but using startxlogfilename, startpoint and backupfrom from the labelfile contents. I think this information can easily be passed as a single structure. In fact, I might think a bit more here and wrap label_file, tblspc_map_file to a single structure something like below and pass it across the functions. typedef struct BackupState { StringInfo label_file; StringInfo tblspc_map_file; char startxlogfilename[MAXFNAMELEN]; XLogRecPtr startpoint; char backupfrom[20]; } BackupState; This way, the code is more readable, structured and we can remove 2 sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1 strstr() call. Only thing is that it creates code diff from the previous PG versions which is fine IMO. If okay, I'm happy to prepare a patch. Thoughts? Regards, Bharath Rupireddy.
        • Jump to comment-1
          horikyota.ntt@gmail.com2022-07-26T02:49:05+00:00
          At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Hm. I think we must take this opportunity to clean it up. You are > right, we don't need to parse the label file contents (just like we > used to do previously after reading it from the file) in > do_pg_backup_stop(), instead we can just pass a structure. Also, > do_pg_backup_stop() isn't modifying any labelfile contents, but using > startxlogfilename, startpoint and backupfrom from the labelfile > contents. I think this information can easily be passed as a single > structure. In fact, I might think a bit more here and wrap label_file, > tblspc_map_file to a single structure something like below and pass it > across the functions. > > typedef struct BackupState > { > StringInfo label_file; > StringInfo tblspc_map_file; > char startxlogfilename[MAXFNAMELEN]; > XLogRecPtr startpoint; > char backupfrom[20]; > } BackupState; > > This way, the code is more readable, structured and we can remove 2 > sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1 > strstr() call. Only thing is that it creates code diff from the > previous PG versions which is fine IMO. If okay, I'm happy to prepare > a patch. > > Thoughts? It is more or less what was in my mind, but it seems that we don't need StringInfo there, or should avoid it to signal the strings are not editable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
          • Jump to comment-1
            david@pgmasters.net2022-07-26T11:52:31+00:00
            On 7/25/22 22:49, Kyotaro Horiguchi wrote: > At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in >> Hm. I think we must take this opportunity to clean it up. You are >> right, we don't need to parse the label file contents (just like we >> used to do previously after reading it from the file) in >> do_pg_backup_stop(), instead we can just pass a structure. Also, >> do_pg_backup_stop() isn't modifying any labelfile contents, but using >> startxlogfilename, startpoint and backupfrom from the labelfile >> contents. I think this information can easily be passed as a single >> structure. In fact, I might think a bit more here and wrap label_file, >> tblspc_map_file to a single structure something like below and pass it >> across the functions. >> >> typedef struct BackupState >> { >> StringInfo label_file; >> StringInfo tblspc_map_file; >> char startxlogfilename[MAXFNAMELEN]; >> XLogRecPtr startpoint; >> char backupfrom[20]; >> } BackupState; >> >> This way, the code is more readable, structured and we can remove 2 >> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1 >> strstr() call. Only thing is that it creates code diff from the >> previous PG versions which is fine IMO. If okay, I'm happy to prepare >> a patch. >> >> Thoughts? > > It is more or less what was in my mind, but it seems that we don't > need StringInfo there, or should avoid it to signal the strings are > not editable. I would prefer to have all the components of backup_label stored separately and then generate backup_label from them in pg_backup_stop(). For PG16 I am planning to add some fields to backup_label that are not known when pg_backup_start() is called, e.g. min recovery time. Regards, -David
            • Jump to comment-1
              bharath.rupireddyforpostgres@gmail.com2022-07-26T11:59:29+00:00
              On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote: > > On 7/25/22 22:49, Kyotaro Horiguchi wrote: > > At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > >> Hm. I think we must take this opportunity to clean it up. You are > >> right, we don't need to parse the label file contents (just like we > >> used to do previously after reading it from the file) in > >> do_pg_backup_stop(), instead we can just pass a structure. Also, > >> do_pg_backup_stop() isn't modifying any labelfile contents, but using > >> startxlogfilename, startpoint and backupfrom from the labelfile > >> contents. I think this information can easily be passed as a single > >> structure. In fact, I might think a bit more here and wrap label_file, > >> tblspc_map_file to a single structure something like below and pass it > >> across the functions. > >> > >> typedef struct BackupState > >> { > >> StringInfo label_file; > >> StringInfo tblspc_map_file; > >> char startxlogfilename[MAXFNAMELEN]; > >> XLogRecPtr startpoint; > >> char backupfrom[20]; > >> } BackupState; > >> > >> This way, the code is more readable, structured and we can remove 2 > >> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1 > >> strstr() call. Only thing is that it creates code diff from the > >> previous PG versions which is fine IMO. If okay, I'm happy to prepare > >> a patch. > >> > >> Thoughts? > > > > It is more or less what was in my mind, but it seems that we don't > > need StringInfo there, or should avoid it to signal the strings are > > not editable. > > I would prefer to have all the components of backup_label stored > separately and then generate backup_label from them in pg_backup_stop(). +1, because pg_backup_stop is the one that's returning backup_label contents, so it does make sense for it to prepare it once and for all and return. > For PG16 I am planning to add some fields to backup_label that are not > known when pg_backup_start() is called, e.g. min recovery time. Can you please point to your patch that does above? Yes, right now, backup_label or tablespace_map contents are being filled in by pg_backup_start and are never changed again. But if your above proposal is for fixing some issue, then it would make sense for us to carry all the info in a structure to pg_backup_stop and then let it prepare the backup_label and tablespace_map contents. If the approach is okay for the hackers, I would like to spend time on it. Regards, Bharath Rupireddy.
              • Jump to comment-1
                david@pgmasters.net2022-07-26T12:20:49+00:00
                On 7/26/22 07:59, Bharath Rupireddy wrote: > On Tue, Jul 26, 2022 at 5:22 PM David Steele <david@pgmasters.net> wrote: >> >> On 7/25/22 22:49, Kyotaro Horiguchi wrote: >>> At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in >>>> Hm. I think we must take this opportunity to clean it up. You are >>>> right, we don't need to parse the label file contents (just like we >>>> used to do previously after reading it from the file) in >>>> do_pg_backup_stop(), instead we can just pass a structure. Also, >>>> do_pg_backup_stop() isn't modifying any labelfile contents, but using >>>> startxlogfilename, startpoint and backupfrom from the labelfile >>>> contents. I think this information can easily be passed as a single >>>> structure. In fact, I might think a bit more here and wrap label_file, >>>> tblspc_map_file to a single structure something like below and pass it >>>> across the functions. >>>> >>>> typedef struct BackupState >>>> { >>>> StringInfo label_file; >>>> StringInfo tblspc_map_file; >>>> char startxlogfilename[MAXFNAMELEN]; >>>> XLogRecPtr startpoint; >>>> char backupfrom[20]; >>>> } BackupState; >>>> >>>> This way, the code is more readable, structured and we can remove 2 >>>> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1 >>>> strstr() call. Only thing is that it creates code diff from the >>>> previous PG versions which is fine IMO. If okay, I'm happy to prepare >>>> a patch. >>>> >>>> Thoughts? >>> >>> It is more or less what was in my mind, but it seems that we don't >>> need StringInfo there, or should avoid it to signal the strings are >>> not editable. >> >> I would prefer to have all the components of backup_label stored >> separately and then generate backup_label from them in pg_backup_stop(). > > +1, because pg_backup_stop is the one that's returning backup_label > contents, so it does make sense for it to prepare it once and for all > and return. > >> For PG16 I am planning to add some fields to backup_label that are not >> known when pg_backup_start() is called, e.g. min recovery time. > > Can you please point to your patch that does above? Currently it is a plan, not a patch. So there is nothing to show yet. > Yes, right now, backup_label or tablespace_map contents are being > filled in by pg_backup_start and are never changed again. But if your > above proposal is for fixing some issue, then it would make sense for > us to carry all the info in a structure to pg_backup_stop and then let > it prepare the backup_label and tablespace_map contents. I think this makes sense even if I don't get these changes into PG16. > If the approach is okay for the hackers, I would like to spend time on it. +1 from me. Regards, -David
                • Jump to comment-1
                  bharath.rupireddyforpostgres@gmail.com2022-07-30T00:07:23+00:00
                  On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote: > > >> I would prefer to have all the components of backup_label stored > >> separately and then generate backup_label from them in pg_backup_stop(). > > > > +1, because pg_backup_stop is the one that's returning backup_label > > contents, so it does make sense for it to prepare it once and for all > > and return. > > > >> For PG16 I am planning to add some fields to backup_label that are not > >> known when pg_backup_start() is called, e.g. min recovery time. > > > > Can you please point to your patch that does above? > > Currently it is a plan, not a patch. So there is nothing to show yet. > > > Yes, right now, backup_label or tablespace_map contents are being > > filled in by pg_backup_start and are never changed again. But if your > > above proposal is for fixing some issue, then it would make sense for > > us to carry all the info in a structure to pg_backup_stop and then let > > it prepare the backup_label and tablespace_map contents. > > I think this makes sense even if I don't get these changes into PG16. > > > If the approach is okay for the hackers, I would like to spend time on it. > > +1 from me. Here comes the v1 patch. This patch tries to refactor backup related code, advantages of doing so are following: 1) backup state is more structured now - all in a single structure, callers can create backup_label contents whenever required, either during the pg_backup_start or the pg_backup_stop or in between. 2) no parsing of backup_label file lines now in pg_backup_stop, no error checking for invalid parsing. 3) backup_label and history file contents have most of the things in common, they can now be created within a single function. 4) makes backup related code extensible and readable. One downside is that it creates a lot of diff with previous versions. Please review. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/
                  • Jump to comment-1
                    bharath.rupireddyforpostgres@gmail.com2022-08-08T13:50:31+00:00
                    On Sat, Jul 30, 2022 at 5:37 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 5:50 PM David Steele <david@pgmasters.net> wrote: > > > > >> I would prefer to have all the components of backup_label stored > > >> separately and then generate backup_label from them in pg_backup_stop(). > > > > > > +1, because pg_backup_stop is the one that's returning backup_label > > > contents, so it does make sense for it to prepare it once and for all > > > and return. > > > > > >> For PG16 I am planning to add some fields to backup_label that are not > > >> known when pg_backup_start() is called, e.g. min recovery time. > > > > > > Can you please point to your patch that does above? > > > > Currently it is a plan, not a patch. So there is nothing to show yet. > > > > > Yes, right now, backup_label or tablespace_map contents are being > > > filled in by pg_backup_start and are never changed again. But if your > > > above proposal is for fixing some issue, then it would make sense for > > > us to carry all the info in a structure to pg_backup_stop and then let > > > it prepare the backup_label and tablespace_map contents. > > > > I think this makes sense even if I don't get these changes into PG16. > > > > > If the approach is okay for the hackers, I would like to spend time on it. > > > > +1 from me. > > Here comes the v1 patch. This patch tries to refactor backup related > code, advantages of doing so are following: > > 1) backup state is more structured now - all in a single structure, > callers can create backup_label contents whenever required, either > during the pg_backup_start or the pg_backup_stop or in between. > 2) no parsing of backup_label file lines now in pg_backup_stop, no > error checking for invalid parsing. > 3) backup_label and history file contents have most of the things in > common, they can now be created within a single function. > 4) makes backup related code extensible and readable. > > One downside is that it creates a lot of diff with previous versions. > > Please review. I added this to current CF - https://commitfest.postgresql.org/39/3808/ -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/